square loss
A Proof of Theorem 1

Proof. Theorem 6 is stated in terms of Gaussian complexity; a full proof is given by Ben-David (2014). $\mathcal{M}(\alpha)$ is the linear class following the depth-$K$ neural network. The second term relies on the Lipschitz constant of the DNN, which we bound with the following lemma. Similar results are given by Scaman and Virmaux (2018) and Fazlyab et al. (2019).
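The sketch below illustrates the kind of Lipschitz bound referred to here: for a ReLU network, the Lipschitz constant is at most the product of the spectral norms of its weight matrices, the naive bound that the refinements of Scaman and Virmaux (2018) improve upon. This is a minimal illustration under that assumption, not the lemma proved in this appendix; the function name and the network widths are placeholders.

```python
import numpy as np

def lipschitz_upper_bound(weights):
    """Naive Lipschitz upper bound for a ReLU network f(x) = W_K s(... s(W_1 x)).

    Since ReLU is 1-Lipschitz, Lip(f) <= prod_k ||W_k||_2 (spectral norms).
    The bound is cheap to compute but generally loose.
    """
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, ord=2)  # largest singular value of W
    return bound

# Example: a random depth-3 network with widths 10 -> 64 -> 64 -> 1.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 10)),
           rng.normal(size=(64, 64)),
           rng.normal(size=(1, 64))]
print(lipschitz_upper_bound(weights))
```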
Understanding Square Loss in Training Overparametrized Neural Network Classifiers
Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures, but when it comes to the loss function, cross-entropy is the predominant choice. Recently, several alternative losses have seen revived interest for deep classifiers. In particular, empirical evidence seems to favor the square loss, but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of the square loss in classification by systematically investigating how it performs for overparametrized neural networks in the neural tangent kernel (NTK) regime.
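As a concrete reference point for the training setup the abstract describes, the following sketch swaps the usual cross-entropy objective for square loss on one-hot targets in a wide two-layer ReLU network. It is a minimal, hypothetical sketch: the architecture, width, data, and hyperparameters are assumptions and do not reproduce the paper's experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Wide two-layer ReLU network, in the spirit of NTK-regime analyses.
model = nn.Sequential(nn.Linear(20, 4096), nn.ReLU(), nn.Linear(4096, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(128, 20)           # toy inputs
y = torch.randint(0, 10, (128,))   # toy class labels

for _ in range(100):
    logits = model(x)
    # Square loss on one-hot targets instead of the usual cross-entropy:
    # loss = F.cross_entropy(logits, y)
    loss = F.mse_loss(logits, F.one_hot(y, num_classes=10).float())
    opt.zero_grad()
    loss.backward()
    opt.step()
```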
Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. In the former problem, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times \{\pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. In the latter problem, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times \mathbb{R}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with square loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower bounds of $d^{\mathrm{poly}(1/\epsilon)}$ for both of these problems. Our SQ lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
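To make the agnostic-learning objectives concrete, the sketch below generates data with a standard Gaussian marginal and arbitrary (here, noisy) labels, then evaluates the 0-1 loss of a halfspace and the square loss of a ReLU. It only illustrates what $\mathrm{OPT}+\epsilon$ refers to; it has nothing to do with the SQ lower-bound constructions, and the variable names and noise model are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 10_000

# Gaussian marginal on x; labels may be arbitrary (here: a halfspace with 10% flipped labels).
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = np.sign(X @ w_true)
flip = rng.random(n) < 0.1
y[flip] *= -1

def zero_one_loss(w, X, y):
    """0-1 loss of the halfspace x -> sign(<w, x>)."""
    return np.mean(np.sign(X @ w) != y)

def square_loss_relu(w, X, y):
    """Square loss of the ReLU x -> max(0, <w, x>)."""
    return np.mean((np.maximum(X @ w, 0.0) - y) ** 2)

# OPT is the loss of the best hypothesis in the class; an agnostic learner must
# output a hypothesis with loss at most OPT + eps.
print(zero_one_loss(w_true, X, y))  # about 0.1 here, an upper bound on OPT
```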
Export Reviews, Discussions, Author Feedback and Meta-Reviews
NIPS (Neural Information Processing Systems), 8-11 December 2014, Montreal, Canada
Paper ID: 1461
Title: The limits of squared Euclidean distance regularization

Review: This paper considers the problem of empirical risk minimization with squared Euclidean distance regularization, which results in a weight vector that is a linear combination of the training examples. The authors prove a linear lower bound on the average square loss of the algorithm on random problems, provided the loss function is sufficiently nice, while the same problems are easy to learn with another algorithm. This is a well-written paper on a simple idea and result, with a rather interesting interpretation. The proposed conjectures on random features and neural networks should be fleshed out in more detail, or at least supported with more empirical evidence.
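The review's opening observation, that squared Euclidean distance regularization yields a weight vector lying in the span of the training examples, can be checked numerically for ridge regression via the standard dual (representer) form. The sketch below is an independent illustration of that identity, not code from the reviewed paper; the dimensions and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 200, 0.1
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Primal ridge solution: w = argmin ||Xw - y||^2 + lam * ||w||^2
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual / representer form: w = X^T alpha with alpha = (X X^T + lam I)^{-1} y,
# i.e. the minimizer is a linear combination of the training examples.
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
w_dual = X.T @ alpha

print(np.allclose(w_primal, w_dual))  # True
```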